Automated extraction of product comparison matrices from informal product descriptions
Abstract
Domain analysts, product managers, or customers aim to capture the important features and differences among a set of related products. Reviewing each product description case by case is laborious and time-consuming, and it fails to deliver a condensed view of a product family. In this article, we investigate the use of automated techniques for synthesizing a product comparison matrix (PCM) from a set of product descriptions written in natural language. We describe a tool-supported process, based on term recognition, information extraction, clustering, and similarities, capable of identifying and organizing features and values in a PCM despite the informality and absence of structure in the textual descriptions of products. We evaluate our proposal against numerous categories of products mined from BestBuy. Our empirical results show that the synthesized PCMs contain a large amount of quantitative, comparable information that can potentially complement or even refine technical descriptions of products. The user study shows that our automatic approach is capable of extracting a significant portion of correct features and correct values. This approach has been implemented in MatrixMiner, a web environment with interactive support for automatically synthesizing PCMs from informal product descriptions. MatrixMiner also maintains traceability with the original descriptions and the technical specifications for further refinement or maintenance by users.
Preprint submitted to JSS, January 5, 2017
Similar resources
Attributes Extraction from Product Descriptions on e-Shops
Some e-shops present product attributes in structured form, but many others use only a textual description. Product attributes are essential for automated product deduplication. We suggest methods for automatically extracting attributes and their values from product descriptions into a structured form. Structured data extracted from other e-shops is used as background knowledge.
The WDC Gold Standards for Product Feature Extraction and Product Matching
Finding out which e-shops offer a specific product is a central challenge for building integrated product catalogs and comparison shopping portals. Determining whether two offers refer to the same product involves extracting a set of features (product attributes) from the web pages containing the offers and comparing these features using a matching function. The existing gold standards for prod...
Semi-Supervised Learning to Extract Attribute-Value Pairs from Product Descriptions on the Web
We describe an approach to extract attribute-value pairs from product descriptions on the Web. The goal is to augment product databases by representing each product as a set of such attribute-value pairs. Such a representation is useful for a variety of tasks where treating the product as a set of attribute-value pairs is more useful than as an atomic entity. Examples include product recommenda...
CS 224N Final Project: Automated extraction of product attributes from reviews
Over the past few years, online product reviews have grown enormously. These reviews give potential buyers a means of assessing the general outlook of a product. Because the reviews are written in a very “informal” and unstructured setting, a single review might not consider all the features of the product. Also, with increasing number of ...
Semi-Supervised Learning of Attribute-Value Pairs from Product Descriptions
We describe an approach to extract attribute-value pairs from product descriptions. This allows us to represent products as sets of such attribute-value pairs to augment product databases. Such a representation is useful for a variety of tasks where treating a product as a set of attribute-value pairs is more useful than as an atomic entity. Examples of such applications include product recomme...
Journal:
- Journal of Systems and Software
Volume: 124
Publication date: 2017